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Abstract 

The maximum independent set (MIS) problem is a well-studied combinatorial optimiza¬ 
tion problem that naturally arises in many applications, such as wireless communication, 
information theory and statistical mechanics. 

MIS problem is NP-hard, thus many results in the literature focus on fast generation of 
maximal independent sets of high cardinality. One possibility is to combine Gibbs sampling 
with coupling from the past arguments to detect convergence to the stationary regime. This 
results in a sampling procedure with time complexity that depends on the mixing time of the 
Glauber dynamics Markov chain. 

We propose an adaptive method for random event generation in the Glauber dynamics that 
considers only the events that are effective in the coupling from the past scheme, accelerating 
the convergence time of the Gibbs sampling algorithm. 


1 Introduction 

An independent set of a graph is a set such that no two nodes in the subset are connected by an 
edge. The maximum independent set (MIS) problem is to find the set of mutually nonadjacent 
nodes with the largest cardinality. MIS is a well-studied combinatorial optimization problem that 
naturally arises in many applications, such as statistical physics (where it is known as the hard 
core gas model) j7] 0], information theory [3] and wireless communication IH El- 

Generating independent sets is one of the key building blocks of the wireless CSMA [si mam m. 
The interference in a wireless network can be modeled by a conflict graph. The nodes are the links 
and there is an edge between two nodes if the corresponding communication links cannot transmit 
simultaneously. At each step of the protocol a set of the communication links is chosen that forms 
an independent set in this conflict graph. In queue-based CSMA, the nodes have weights that are 
the queue sizes at the wireless links. Ideally, one should compute a maximum weight independent 
set (MWIS). However, MWIS problem is NP-hard and hard to approximate in general [TBj. Many 
papers focus on finding good enough independent sets, see [H] for a more detailed discussion in 
the context of CSMA. In [13] the authors consider a message passing approximation algorithm 
for MWIS problem. They show that, if initialized using uninformative messages, their algorithm 
returns an optimal value if it converges. However, the convergence is proven only for the case of 
a bipartite graph with a unique MWIS. 

The focus of this paper is on the approximations to the MWIS problem using Glauber dynamics 
m over the space of independent sets of the interference graph. To simplify exposition, much of 
the analysis is focused on the special case in which all the weights are equal; extensions to the 
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completely general case is explained In Section 4 (at the end of Subsection 4.1.). Glauber dynamics 
are defined by a (reversible) Markov process that has as a stationary distribution a Gibbs measure 


r (A) = 


ycard(A) 

Zx : 


where A is an independent set of the graph, A is a parameter called fugacity, and Z\ is the normal¬ 
ization constant. For A = 1, this corresponds to the uniform distribution over all independent sets 
and when A goes to infinity, the Gibbs measure is concentrated on the maximum independent sets. 
For high values of A, the mixing time of this dynamics becomes prohibitively large. Furthermore, 
the existing bounds for the mixing time of graphs are limited to the bounded degree case mm- 

We combine the Glauber dynamics with the coupling from the past (CFTP) construction for 
stationarity detection of Markovian dynamics. CFTP is an exact simulation technique introduced 
by Propp and Wilson [15] . The original algorithm is computationally efficient only under some 
monotonicity assumptions on the Markovian dynamics. In the general case, the CFTP algorithm 
requires the construction of one trajectory for each initial condition, which is computationally 
intractable in most applications. Huber !.7j proposed a more general CFTP algorithm that is 
based on a construction of a bounding chain that avoids this dependence on the cardinality of 
the state space. However, this comes with a new penalty - the running time of the algorithm 
can be much larger than the mixing time of the original Markovian dynamics. Intuitively, many 
transitions have no effect on the bounding chain, inducing useless steps in the CFTP algorithm. 

The main contribution of this paper is a new CFTP algorithm that uses an adaptive event 
table and avoids generation of events that do not have any effect on the bounding chain. The idea 
of skipping passive events is very natural, but it is far from straightforward to see how this can 
be combined with the CFTP scheme without introducing a bias. The proof that our algorithm 
terminates in finite expected time and provides an exact sample from the stationary distribution 
of the Markovian dynamics is stated as our main result in Theorem 2. 

We illustrate the speed-up of Glauber dynamics for the independent sets on a toy example of a 
star network, for which we can derive bounds for the computation time of our algorithm. We show 
that, unlike the initial Glauber dynamics, our algorithm does not depend on A. Similar results 


are obtained numerically in Section 4.3 for the Barabasi-Albert model [Tj. We also compare the 
proposed algorithm against Dyer and Greenhill dynamics [5] that use a swap operation to speed 
up the convergence. 

The paper is organized as follows. An overview of the coupling from the past construction for 
the exact sampling from the stationary distribution of a Markov chain is given in Section [2] Our 
main contribution is presented in Section [3] We start in Section |3.1| by introducing the idea of 
active and passive events and the construction of a dynamic event table that contains only active 
events. Section [372] contains a detailed discussion on the skipping of events in the CFTP scheme, 
summarized in Algorithm 3. The validity of the algorithm is provided by Theorem 2. Section [4] 
contains the application to independent sets, while Section 5 concludes the paper. 


2 Coupling From the Past 

Throughout this section, S designates a finite state space. 

We study some properties of ergodic Markov chains over S, namely in the case of a joint 
distribution between multiple Markov chains. 

2.1 Mixing Time 

The mixing time is an indicator of how long it takes a Markov chain to forget its initial distribution. 
In essence, it measures how long Markov chain Monte Carlo methods must run before being “close” 
to the stationary distribution. 
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Let p and tt be two probability distributions over S. Recall that the total variation distance 
||-|| TV between the two is defined as 

Up - ttIItv = Ip (-4) -n{A)\. 


Definition 1 (Mixing time). Let M be the transition matrix of an ergodic Markov chain over S , 
with stationary distribution ir. For all x G S, let (X* (x)) ien be the Markov chain with transition 
matrix M and initial state x, and, for all i £ N, call pf the distribution of X* (x). The mixing 
time t m i x of M is defined as 


i G N 


max ||p? - 7T 



Notice that the mixing time is defined for transition matrices rather than Markov chains, as 
its definition considers all Markov chains generated by a given transition matrix. 

Property 1. Using the above notation, if, at instant i £ N, there exists an event A and a state 
x G S such that 

\p*(A)-n(A)\>\, 

then i < t m ix- 

This last property serves to derive lower-bounds on the mixing time of certain transition 
matrices. 


2.2 Coupling Time 

Just like the mixing time allows us to measure how quickly Markov chains converge towards their 
stationary distributions, the coupling time measures how long it takes before two or more Markov 
chains “meet” in a same state. 


Definition 2 (Coupling). Let K be a finite set of indices. For all k G K, let 

be a Markov chain over S. A coupling of the X k is a family of joint Markov chains 


x = nxn i& 


k£K 


defined over a same probability space such that, for all k G K, the marginal distribution of X k = 
is that o}X k . 

Definition 3 (Coupling Time). Let X be a coupling of Markov chains over S. We say X has 
coupled at instant i if all the X k , k G K are equal. 

We furthermore define the coupling time t of X as the first instant at which X has coupled, 


t = min {i G N | V (Ac, l) G K 2 , X k = X‘} , 
with the convention that min0 = +oo. 

Note that, unlike the mixing time, the coupling time is a random variable. 
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2.3 Markov Automata 


Markov automata are a convenient means of writing random mapping representations [5] when 
defining a coupling between Markov chains that have the same transition matrix. 

Definition 4 (Markov automaton). A Markov automaton is a quadruple A = ( S , A, D , ■), where: 

• S is a finite state space; 

• A is a finite alphabet; 

• D is a probability distribution over A; 

• ■ is an action by the letters of A on the states of S: 

fSxA ->S 
1 (x, a) i—>■ x ■ a 

We make the assumption that, for all a £ A, D (a) > 0. If not, it is possible to build a reduced 
Markov automaton satisfying this property by removing all the letters a £ A such that D (a) = 0. 

Let A* = UfceN^ fc be the set of finite words, A u = be the set of infinite words, and 
A°° =4‘U4“. 

For a word u £ A°° and for —oo < i,j < oo, we denote u^j the subword (zq,..., Uj) if i < j, 
or e if j < i. 

For convenience, we furthermore write S' • a for {a; • a | a; G 5} and x ■ U\^ n for x ■ U\ ■ ... ■ u n , 
such that S ■ iti-m stands for 

{x ■ u\ ■... ■ u n | x £ S} . 

Let A = ( S,A,D , •) be a Markov automaton, and ui_>.oo ~ D® N . For all x £ S, define 
X (at) = (Xi (z)) iGN by 

V? e N, Aj (x) = x ■ 

i.e. Xi (x) is the state reached when starting in x and reading ui _>j. 

Property 2 . For every x £ S, X ( x ) is a Markov chain, called the Markov chain generated by A 
and x. Furthermore, these Markov chains have the same transition matrix M 4 . 

The family X = (A' (x)) x£S is a natural coupling between these Markov chains, called the 
grand coupling of A. 

If there exists a word Ui_> n such that S ■ U\^, n is a singleton, we say that A couples, and call 
ui-m a coupling word. 

Property 3 ( j!2jf . If A couples, we have that M 4 is ergodic, and that 

P { lim |Ai| = l) = 1, 

Ki—>00 ) 

i.e. X a.s. has a finite coupling time. 

The reciprocal is not true: it is possible to construct a non-coupling Markov automaton A 
such that M 4 is ergodic. 

If A couples, we define its stationary distribution and mixing time as those of M_ 4 , and its 
coupling time r as that of X. The coupling time of a Markov automaton is closely linked to its 
mixing time, as shown in the following property. 

Property 4 (0). If A couples, the expected coupling time of A is lower-bounded by the mixing 
time t m j x of M a , i.e. 

*mix < E [r]. 

ft is important to underline that the distribution of the unique value of the grand coupling at 
the first moment of coupling is not distributed according to the stationary distribution of A | 6 ]. 
We now introduce an algorithm which uses the grand coupling of a Markov automaton to obtain 
that distribution. 
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2.4 Coupling from the Past 

Let A = (S, A, D , •) be a coupling Markov automaton and ~ -D® N . Define (Si) ieN by 

V* £ N ,Si = S- i 

and let r h be the first i for which Si is a singleton, T& is called the backwards coupling time of the 
Markov automaton. 

Theorem 1 ([12j). Using the above notations, we have that the unique element of S T b is a.s. 
distributed according to the stationary distribution of A, and that E [r fc ] = E[r], where r is the 
coupling time of A. 

This method for generating random variables according to the stationary distribution of a 
Markov chain is called coupling from the past (CFTP). The sequence U-oo^-i is called the gen¬ 
erating sequence. The corresponding algorithm is given in Algorithm [l] 


Algorithm 1 Coupling From the Past (CFTP) 
function CFTP((<S, A, D, •)) 

for s £ S do 
S (s) £- s 
end for 
repeat 

a £- Draw(-D) 

for s £ S do 

T (s) <— S (s ■ a) 

end for 

S T 

until |S (5)| = 1 

return UniqueElementOf(£ (5)) 

end function 


Property 5. The expected complexity of the CFTP algorithm is 0 (|<S|t 7 ), where 7 is the com¬ 
putation time of ■. 

The complexity is linear in |<S|, which can be very large. A workaround for this is to use 
bounding chains [7]. 

Definition 5. Let B be a subset of the power set of S, containing S, and let o : B x A —» B be an 
operator such that for all a £ A and B £ B, 

x£B=^x-a£Boa. 

The bounding chain of S ■ U\^ n induced by ( B , o) is the sequence S o u\^ n = S o u\ o ... o u n . 
Notice that, for any n £ N and Mi_, n £ A n , 

S • u,i—t n (Z S o ui—i n , 


hence the term bounding chain. 

Let ~ £)® n . For all * £ N, we have that 

S ■ C S o 

As a consequence, if there exists a word u _ r such that S o u_ ra _ > _ 1 is a singleton, then so is 
S ■ u-n-t—i, and they contain the same element. 
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From this, we derive a variant of the CFTP algorithm in which we iteratively compute 


Bi = S o 

until we obtain a singleton. The backwards coupling time r b of the bounding chain is then defined 
as the hitting time of the set of singletons. Note that the sequence 

(S o 

must now be recomputed at each iteration. This yields an overall complexity in O (r 2 r), where T 
represents the computation time of o, often small compared to 7 |<S|. 

The appearance of a quadratic dependency in r can be overcome by doubling the period at 
each iteration m- The algorithm is given in Algorithm [2j and has a complexity of 

0(tT). 


Algorithm 2 CFTP with Bounding Chains 
function Bounded-CFTP((<S, A, D, •)) 
«!<-£ 
k 4 — 1 

repeat 

w 4 - Draw (D® k ) ■ w 
B<-5ow 
k 4 — 2k 

until \B\ = 1 

return UniqueElementOf(B) 

end function 


3 Skipping 

Skipping was introduced in mi, in a form close to what we call here incremental skipping , as a 
means of speeding up the CFTP algorithm by avoiding certain “passive” events. It is introduced 
here alongside our own approach, oracle skipping. 

We first introduce these two methods on forward coupling chains, and show their computa¬ 
tional similarities. We then adapt oracle skipping to the CFTP algorithm, and give proof of its 
correctness. 

3.1 Oracle and Incremental Skipping 

Consider a forward coupling algorithm, with bounding chain B = and let be the 

corresponding sequence of letters: 


Vi £ N, Bi = S o ui-yi 


We say a letter a is inactive at instant i if Bi o a = Bi, and that it is active otherwise. Let 
be the set of active events at instant i. With these definitions, we consider a new bounding 
chain B° = (Bf) . gN such that 

Vi € N, Bf = S o v\ _h, 

with the vi-^oo drawn according to D conditioned to being active letters: 

Vi G N, Vi ~ D (• | &„!_*_!) . 

This method of generating a bounding chain is called oracle skipping. 


6 





Despite oracle skipping being easy to manipulate on a theoretical level, it can be difficult to get 
an efficient implementation of the algorithm. This is due to the cost of computing the conditional 
distribution at each iteration. 

The original skipping algorithm El, incremental skipping , provides a workaround for this: 
rather than recomputing the entire distribution of events at each step, the algorithm updates its 
distribution incrementally. 

Consider that the resulting bounding chain B x = (-Bf) is obtained from a word v[^ n , such 
that 

= Soi7'_*. 

Begin by setting D° = D. For all i G N, v[ is drawn according to D'. If v[ £ 2l„; set 
D l+1 = D , otherwise define D l+1 by removing v[ from the set of possible events: 

Va e A, D l+1 (a) = D l (a | a ^ v[). 

This process constructs the distributions (.D 2 ) i recursively by removing at most one event at 

each iteration. 

Though incremental skipping is more efficient in the case of complicated conditional distribu¬ 
tions, the complexity of the rest of the algorithm is greater than in the case of oracle skipping. 
Furthermore, it is very difficult to obtain theoretical bounds with incremental skipping. 

The following property justifies studying oracle skipping to derive bounds of coupling time of 
skipping algorithms, regardless of implementation. 

Property 6. Call To and tx the coupling times of B° and B x . We have that 


To <TI < M ■ TO: 


where M = |A| is the cardinality of the event set. 

Proof. Notice that Wi_>.oo can be obtained from by removing all of its passive letters, and 

v'l^oo can be obtained from rti-^oo by removing some of its passive letters. Doing so results in a 
coupling of the three bounding chains B °, B x and B such that there exists increasing functions 
(f> and ip from N to N satisfying: 

B ? = B 4,(i ) = B H<Ki))- 

The first inequality is a direct consequence of this, since v\^ To is therefore a subsequence of v\^ Tx . 

For the second inequality, notice that two active events in are separated by a sequence 

of pairwise different passive events, and the distance between the two is therefore at most M. This 
implies that v\^ TO contains at least one letter out of M from v\^ Tx . i.e. 


to > 


TL 

M~ 


This concludes the proof. 


□ 


3.2 CFTP with Oracle Skipping 

We now adapt oracle skipping to the CFTP algorithm. For an adaptation of incremental skipping, 
see I TT ]. 

The difficulty in implementing a CFTP algorithm with oracle skipping lies in the fact that, as 
we move backwards in time, the state of the system at a fixed instant —k changes. The event U-u 
can therefore start out as active, then become passive, then active again, etc. each time we go 
further back in time. Whereas removing events that have become passive is not difficult, a passive 
event that was removed and that ought to be active once more cannot simply be pushed back in; 
keeping the event in memory would imply drawing every event, which defeats the purpose. 

The solution adopted here consists in dropping passive letters completely, and inserting ac¬ 
tive letters according to an adequate distribution, one that preserves the dynamics of the initial 
bounding chain. We give the details of this algorithm, and prove its correctness. 
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a b c h b b a S o h 


h-contraction 



a b c b b b a 


(h 1 ^-expansion 


S o h 


abacchca 


So 




abacchca 

Active letters are shown in bold font. 


Figure 1: Contracting and Expanding 


To begin, we introduce a delimiter, denoted jj, used to split up our sequence of events, and two 
new operations: contraction, which consists in removing passive letters, and expansion, through 
which these passive letters are added back into a contracted word. 

Fix a Markov automaton A = (S,A,D,-). Let B be a subset of the power set of S and o be 
an operator, such that (B , o) induces a bounding chain for the grand coupling of A. Define the 
delimiter ft as a letter that leaves states unchanged: 

Vx £ S, x ■ jj = x and \/B £ B, B o jj = B. 

Let A$ = A U {ft}. For all q £ [0,1], call D q be the distribution over Aft such that: 

D q (U) = q and Va £ A , D q (a) = (1 - q) ■ D (a). 

Furthermore, for any subset S of S , let 

21 (S) = {a £ A | S ± S o a} U {jj} 


be the set of active letters and 


¥ (S) = {o £ A | S = S o a} 


be the set of passive letters. To simplify notations, we write 


21a!^ fc = 21 (5 o and ^ (S o ai_^) , 


as these are the active and passive letters of the bounding chain after having read a^k- 

Notice that jj is an exception: it never modifies the state of the chain, yet is considered to be 
an active letter nonetheless. 

We now define contraction and expansion. These are illustrated in Figure [l] 


Definition 6 (/i-contracting). Contraction consists in removing the passive letters in a word. For 
a given history h £ A|, call /i-contraction the operation c h : A°° —► A°° defined recursively by 
c h (e) = e and 


a ■ c h a ( u ) if a £ 2l; t 
c h ( u ) otherwise. 


Notice that contraction is idempotent. A word invariant under c h is called a h-contracted word. 













abacchca a b c b b b a 

' - .. - '--' 



c h (it) = c h ( V ) 
4. 

h 

U = V 


Active letters are shown in bold font. 


Figure 2: ft.-equi valence 


Definition 7 ((/t, g)-expanding). Expansion consists in inserting passive letters in a word. For 
a given history h £ AZ and q £ [0,1], call (h,q)-ex pansion the operation e q : A°° —» A°° defined 
recursively by e q (e) = e and 

h ^ ^ I p ■ e 1 ^ (a ■ u) with probability D q 

]a-e q (u) with probability D q fiAh), 

where p is a passive letter drawn independently according to 

D q (-1 qj/0, 

the distribution D q restricted to inactive letters. 

Note that, during expansion, the number of passive letters inserted before each letter in the 
initial word is geometrically distributed. 

Applying e q to a contracted word corresponds to constructing what the word “could have been” 
before it was contracted by c h , under the assumption that its letters were originally i.i.d. according 
to D q and that it ended with an active letter. 

Definition 8 (/i-equivalence). Let h £ AJ be a history, and u £ A^° and v £ A|° be two words, 

possibly drawn at random. We say u and v are h-equivalent, written u = v, if they almost surely 
have the same h-contractions. 

Two words are /i-equivalent if their contracted forms yield the same trajectories, as illustrated 
in Figure [2] Since contraction removes only passive letters, the words themselves give similar 
trajectories, but with pauses inserted at different moments, during which the trajectory is constant. 

Property 7. Let h £ AJ be a history, q £ [0, 1], and u £ A^°. We have that 

c h (it) = u and e q ( u ) = it, 

and therefore that 

e h q (c h {u))ku. (1) 

Additionally, if u £ A|, v £ and u = v, then 

Sohou = Sohov. (2) 
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Finally, under the same assumptions, 

c hu = c h ' v and e h q u = e h q v . (3) 

These results are straightforward, but will be used throughout the rest of this section to better 
analyse the effects of contraction and expansion. Note that the reciprocal of (2) is not true: two 
trajectories ending in the same state are not necessarily equivalent. 

We now justify the above statement that expansion somewhat reconstructs contracted words. 

Property 8. Let u ~ and h £ AJ. We have that e q ( c h (u)) ~ D® N . 

Proof. Let v = e q ( c h ( u )), l £ N and a\^i £ Aj. We have that 

i 

P {vi^yi = ai_y/} = []P {v k = a k | vi^k -1 = ai_s.fc_i} . 

fc=l 


Showing that, for all k £ N, 


P {'Cfc — O'k | ^ 1 — >k— 1 — Gi_ffc — l} — (Gfc) 


(4) 


would yield that v\^i ~ This being true for all l £ N, we would in turn have that 


v~Df*. 

We now show Q by differentiating the two cases where v k is or is not in tyh-a^k-i ■ 

P {v k = a fc | vi^ k -i = ai_>.fe_i} 

P ^yVk G/c Vk ^ ^1— >k— 1 ^1—1^ 

X P (ufc £ | ^ r l— >k— 1 

+ P ^ Cfc — Qfc V k £ ^1—k/c —1 — j 1 

X P '^U/ c £ 21 h-ai^k — l | ^1—vfc— 1 j 1 . 

Notice that v k is in ^h ai^k-i if anc i on ly if it was added during expansion. By definition, this 
occurs with probability D q so we have that 


P {v k £ ‘P/i-q 


Vl—tk—l — — D q ) 


and 


P | V\^,k—1 — — Dq (21 


Consider the case where zj*, is passive. It was inserted during expansion, and its distribution 
is therefore D„ 




' |v/c 


= Gfe 


Vk 


£ *$h-ai_,k — l ^1—kfc— 1 — 


— Dq (afc | . 

Similarly, if v k is known to be active, then it was already in u and was not removed during 
contraction. Its distribution was therefore D q conditioned to being active, i.e. D„ L : 

P — Gfc Vk £ 21 h-ai—,k—l f\vi^k — l — Gi—xfc — ij' 

= Dq (a k | 2(hai^fc-i) ■ 
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Combining all these results gives that 


P {v k = a k | vi_>fc_i = ai^k-i} 

= D q (dfc | Vfih ai^k-l) X Dq (^/i-oi-nb-i) 

+ Dq (a k | x Dq (f&-h-a±->k-l ) 

= Dq (ttfc) . 


This concludes the proof. □ 

Let u ~ D® n , and Ui_j.fi be the same word truncated after the first appearance of the letter jj. 
Call G q the distribution of ui_>.jj. 

Property 9. Let ui_>.jj ~ G q and h £ AJ. We have that ej {c h (ui_>.j|)) ~ G q . 

Proof. Let u ~ Z?® N such that ui-qi is u truncated after the first jj. 

By definition, )) is always active, and is therefore neither removed when contracting nor inserted 
when expanding. As a result, e* ( c h (ui_j.fi)) is ( c h (u)) truncated after the first jj. Combining 
this and the fact that, according to Property^! ( c h (it)) ~ D® N , we have that ej ( c h (iti-yj)) ~ 
Gq. □ 

This property justifies the claim that expansion corresponds to reconstructing (in distribution) 
a word that has been contracted, as this is indeed the case when the original word is drawn 
according to G q . 

For n £ N, consider a sequence of words (w m ) mg j 1 n j, independently distributed according to 
G 2 -™, and call Q n the distribution of 


We now define the (^-expansion of a word. The aim is once again to rebuild what a word “could 
have been” before it was contracted, supposing it was initially drawn according to Q n for some 
n G N. 

Formally, given a word v finishing with its nth j), we first split it into a sequence of words 
v n ■ ... ■ v 1 such that each v m contains exactly one j), which is its last letter. Notice that this 
decomposition is unique. We then define the t/-expansion of v as 

e G (v) = e e 2 -n (v n ) ■...■ ef.- vm+1 ( v m ) • ... • ef(v 1 ) , 


or e, if n = 0 . 

Property 10 . Let u ~ Q n . We have that 

eg (c e (u)) ~ Q n 


and 

eg (c e (u)) = u. 

Proof. Notice that, by definition of contraction, 

c h (vw) = c h (v) ■ c h ’ v (w) (5) 

for any finite words v, w and h. Let u n ■ n" -1 •... • u 1 be the unique decomposition of u such that 
each u m ends with its unique sharp. We have that 

eg (c e (u)) = eg (c e (u n ) ■... ■ c u (n 1 )) • 
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Since each c u + ( u m ) must also finish with its unique jj, the definition of eg gives that this 
is equal to 


The last line is a 


e\- n (c e (< u n )) •... • ej ( “ B) '-' C “ 

= e|_„ (c e K)) • ... • ef (w 1 )) 

= e^_„ (c e (w n )) • ... • e|”--“ 2 (c“ n ' -“ 2 (u 1 )) . 
consequence of (|3|) , since 

Vm G [1, n], c e (u n ■ ... ■ u m ) = u n ■ ... ■ u m . 



Let 


for all to G [l,n], such that 


W (u m )) 


u n -...-u m+1 ( ai 71 ■ ...-u rnJrl / m\ 

t; = e 9 -m c 


eg (c e (u)) = v n ■ ... ■ v m ■... ■ v 1 . 


By definition, every u m is distributed according to G 2 -™- Property [9] therefore gives us that every 
v m is also distributed according to G 2 -™. , which in turn implies that v is distributed according to 

Qn- 

Using (jTJ) , we also have that, for all m G [l,n], v m and u m are (u n ■... ■ u m+1 ) -equivalent, 
that is to say 


c u ■...■« - =c u :.,U - 


By concatenating and merging these using ([5]), we obtain that 


c e (v 71 -...^ 1 ) =c e (u n -...-u 1 ) 


i.e. eg (c £ («)) = u. □ 

Consider a sequence of words (w m ) mgN , independently distributed such that for all to G N, 
u m ~ G 2 -m. Define the sequence w n recursively such that w° = e and 

w n+1 = c e (u n+1 ■ eg (w n )) . 

This is the basis for our CFTP algorithm with oracle skipping: if w n is not a coupling word, 
then compute w n+1 , repeating the operation until a coupling word is found. 

Property 11. Using the above notation, we have that, for all m < n, S o w n C S o w m . In other 
words, the trajectories obtained at each iteration are embedded in one another. 

Proof. It is enough to show that, for all n G N, 

Sow n+1 CSow n . 


Since w n+1 and u n+1 ■ eg ( w n ) are e-equivalent, ([2]) gives us that 

5 o w n+1 =So u n+1 o eg (w n ) 

CSoeg (w n ), 

the inclusion being a direct consequence of the fact that S o u n+1 C S. Since w n is its own 
e-contraction, Property 10 gives that w n and eg ( w n ) are also e-equivalent, and § yields that 

S o eg (w n ) = S o w n , 


which concludes the proof. 


□ 
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Let 


N = inf {n € N | |5o W n | = 1 } 
be the first iteration at which a coupling word is found. 

Theorem 2. Using the above notation, we have that, if there exists a coupling word for the 
bounding chain of A, then 

P {TV < +00} = 1, 

and 

E [TV] < +00, 

i. e. the algorithm almost surely terminates, and does so after a finite expected number of iterations. 

Furthermore, The unique element of S o w N is distributed according to the stationary distribu¬ 
tion 7T of A. 

Note that, due to Property [TT] for every n > N, S o w n is a singleton that contains the same 
state, distributed according to n. 

Proof. The proof is fundamentally the same as that of Theorem [l] 

For the first part of the theorem, let u G A* be a coupling word for the bounding chain of A, 
and P u = D® H ( u) be the probability of drawing a word with prefix u. At each iteration, the 
probability of u k having u as a prefix is the probability of there being no jj in the first |u| letters, 
i.e. having |it fc | > |u|, times the probability of the prefix being u knowing there are no jj, i.e. P u . 
We therefore have that 


P {u prefix of u k } = P {|w fe | > |u|} P u 

> p {K| > M} p u 



> 0 . 


This implies we will almost surely draw a word u k that couples, at which point w k will also couple 
and the algorithm will stop. Furthermore, the expected number of iterations is finite, as it is 
upper-bounded by 

2 M 

iV 

We now show that the output state is distributed according to 7 r. Call the unique element 
of S o w N . If we can show that, for all e > 0 and all x G S, 

l p i^out = a;} — 7 T (a?)| < e, ( 6 ) 


then we are finished. 

Decompose eg (w N ) as 

eg (w N ) = u N ■ Jj • /- 1 • jj • ■ • • • ft • u 1 ' tt, 
with u m G A* for all to. Let 0 ~ D® z ~ , and 


W — 00 —>-0 — c 


.N 


.N -1 


We begin by showing that vj-oo-to ~ D® 2 ‘ . 

Consider a word ~ D® z and a family of random, independent instances (A: m ) me j 1 

such that, for each to, k m is distributed according to a geometric distribution of parameter 2 -m . 
Let l m = YaLi f° r a H m e [0, TV]. 
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Since u m ■ ft ~ G^-™ for all m , the lengths of the u m are equal to the position of the first j) in a 
sequence of letters i.i.d. according to Df^ m , minus one. This is exactly the geometric distribution 
of parameter 2 -m . In particular, we have that 


and therefore 


W-oo^0 = W-oo-s-0 ' u N ■ ... ■ u 1 

V— oo— l N * V— Zat+1—>• — In-i ’ • • • ' ii+1 —>—lo 

— oo —>-0 

^ N 


Let l = |tt w • u N 1 • ... • it 1 1 , such that 

~ N N-l 1 

W-Z+1^0 =U ■ u ■ ...-u . 


Notice that 


{ x out} = SowN 

= S o eg ( w N ) 

= S ow_ l+1 ^ 0 . 

We now show that if, for some k £ N, W-k ^o is a coupling word, then S o W-k-y o = {^outl- 
If k > l, then 

Rmtl = Sow_ l+1 ^ 0 

= So ui_ l+1 ^_ k _ 1 o w_ k ^o 
CSo w_ k ^ o, 


and if k < l 7 then 

S o ui-k^o = S o w-k^-i o w-i + i_>.o 
C5o w-i+ 1^,0 
= {^outi• 

In both cases, if W- k ^ o is a coupling word, then Sow_ k ^o is a singleton, and therefore necessarily 
equal to {a; out }. 

We now show (| 6 |. Since the bounding chain couple a.s., there exists t £ £N such that 
P { |>S O Wo O W_l O ... O W-t e | = 1} > 1 — £, 
and since wq ■ W -1 • ... • w-t e has the same distribution as vi-t e -y o, we have that 

P {|<S o w_ te ^o\ = 1} > 1 - e. 

Let Y = y ■ w_ te ^ o> with y ~ 7 r. Notice that Y G S o w _ te ^ 0 , and that the letters in w_ te ^, 0 
are i.i.d. according to D , such that Y ~ tt. 

If 5 o = 1, then S o ui_ te ^ 0 = {x ou t}, i.e. Y = a; ou (;. We therefore have that 

f* {s-out 7 ^ f ^ P { |5 o W-t e -y o | > 1} < £) 

and thus 

Vx G S, |P {x oni = x} - P {Y = a:}I < £. 

This is precisely ©■ □ 
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The aim of skipping is to keep only the contracted words, and avoid the expanded ones. Indeed, 
the extended words correspond to what the initial CFTP algorithm stores. Notice that when 
computing w n+1 , w n is expanded and then contracted. In practice, it is possible to combine these 
operations so as to only insert letters that are now active, rather then all “potentially skipped” 
letters. Contracting then only removes letters that were active at the previous iteration. This 
ensures that the size of the word (and therefore the time spent reading it) remains that of the 
contracted form. 

We implement this algorithm using First In First Out queues to represent words. It is given 
in Algorithm [3] Note that there is often an ad hoc means of computing the different sets of active 
letters dynamically, rather than recomputing them at each iteration. 

We now give an important result for computing some upper-bounds on the computation time 
of our algorithm. 

Property 12. Using the previous notation, call Tq the coupling time of the bounding chain of A, 
and Tq = Iw^l coupling time of the corresponding CFTP algorithm with oracle skipping. If 

D® k+1 {a k+1 £ (7) 


is increasing in k, i.e. passive letters become more likely as time passes, then 


E [tq] < 2 • (e [N] + E 



The proof is the same as for the initial CFTP algorithm, with the following two exceptions: 


• Since the algorithm computes transitions beyond the coupling of the bounding chain, it is 
important to make sure the proportion of active events after coupling does not exceed the 
proportion during coupling. This is ensured by condition |7]). 

• For the CFTP algorithm with oracle skipping, we introduce a delimiter jj in our coupling 
word, which is not present in the initial bounding chain. This delimiter is present N times 
in the final coupling word, hence the E [TV] term to account for this. 


Notice that, by construction, the number of times the algorithm goes back in time is the same 
as with normal CFTP. Since jj is added exactly once every time the algorithm does so, we have that 
E [TV] is equal to E [log 2 (r fc )], where r b is the backward coupling time of the algorithm without 


skipping. The overall complexity of the algorithm is therefore in O ^E 


'O 


so long as this does 


no better than O (E [log 2 (r b )]). 

This serves as a motivation to study the average forward coupling time of the Markov automa¬ 
ton with oracle skipping, which can serve as a means of estimating the average coupling time of 
the CFTP algorithm with either oracle or incremental skipping. 


4 Independent Sets 


Let G = (V, E) be a simple undirected graph. A subset I of V is called an independent set if no 
two vertices in / are connected by an edge, i.e. if 

Vx,y £ I, (x, y) £ E. 


Let I be the set of independent sets of G and, for any vertex v £ V, denote N (v) the set of 
neighbors of v, that is to say the w £ V such that ( v , w) £ E. 

We study the performance of the CFTP algorithm with oracle skipping when sampling inde¬ 
pendent sets according to the distribution 


Px (I) = 


A^I 


A £ 


focusing on the case where A is very large. Due to Property 12 we restrict our analysis to the 
complexity for the forward coupling. 
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Algorithm 3 CFTP with Oracle Skipping 


function Oracle-CFTP(A = ( S, A, D, •)) 
n t— 0, w <— [] 

[] is the empty queue 

repeat 

iNCREMENT(n) 

(B,w) -f- Double-History(A, w, n) 

until \B\ = 1 
return ElementOf(B) 

end function 

function Double-History(„4, w, n) 
v 4— w, m 4— n 

(£+,21+, u) <- G-Word(A, 2“ m ) 

Compute the contracted u n 

B~ 4— S 

B + and B~ are the new and old bounding chains 

21“ 4— 21 (5) 

Decrement(to) 
while NotEmpty(i)) do 

Expand and contract w n ~ 1 

21 ^ 21+ U 21“ 

Step 1: Expanding 

a 4- DRAW(D 2 - m | 2( ) 

Try to expend by one letter... 

if a € 21“ then 

... but keep it only if it is passive 

a <r~ Pop(i>) 

Otherwise, take the next letter in v... 

(£.21 ) (B“,r)oa 

... and update the previous chain 

end if 

if a £ 2l + then 

Step 2: contracting 

PusH(a, u) 

Only keep active letters... 

(£+, 21+) 4- (£+,2t+)o a 

... and update the new chain 

if a = jj then 

Decrement (to) 

Probability of seeing ft has changed 

end if 
end if 
end while 
return (£+ ,u) 
end function 

function G-Word(A,p) 

[] is the empty queue 

B 4— S 

Bounding chain 

21 1— 21 (5) 

repeat 

a 4- Draw(L> 9 a ) 

PuSH(a, u ) 

(£, 21) <- (£, 21) o a 

until a = j) 
return (£, 21, u) 
end function 
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4.1 Sampling algorithms 

We compare the coupling time our sampling algorithm with oracle skipping with two other ap¬ 
proaches described in 7]: Gibbs sampling and the Dyer-Greenhill chain j2j- 

4.1.1 Gibbs sampling 

Let us first define a Gibbs sampler for P\. At each iteration, independently draw a vertex v 
uniformly at random and u uniformly over [0,1]. 

• If it > then remove v from I if v £ I, otherwise do nothing. 

• If 0 < u < yyy, then add v to I if N (v) D I = 0, otherwise do nothing. 

This dynamic allows us to use Monte Carlo and CFTP methods to generate independent sets 
according to P\. The CFTP approach can be greatly improved by using the following bounding 
chain for the Glauber dynamic defined in |7j. 

Consider a family of independent sets ACT. Set 

B = n i£aI, D = (Uj e A-0 \ B 


and 

C = n IeA {V \I) = V — B — D. 

We have that 

A C {I g X\B C I C B U D} = (B, D). 

In other words, B is the set of vertices common to every independent set in A, C is the set of 
vertices that are in none of the independent sets of A, and D is the set of vertices that are in some 
but not all of the independent sets of A. The couples ((Bj, Dj)) ieN define a bounding chain for 
the Glauber dynamic (Ai) igN . 

The Gibbs sampler for the bounding chain is defined as follows: at each iteration, independently 
draw a vertex v uniformly at random and u uniformly over [0,1], Suppose the initial state is (B, D ), 
and write B + v for B U {u} and B — v for B \ {u}; the arrival state {B',D') is constructed as 
follows: 

• If it > xpy, then B' = B — v, D' = D — v. 

• If 0 < u < yfpj-, then: 

— if N(v) C C. then B' = B + v and D' = D — v, 

— if N(v) fi B = 0 but N(v) D D ^ 0, then D’ = D + v (v was necessarily in C U D), 

— otherwise do nothing (v was necessarily in C). 

4.1.2 The Dyer-Greenhill scheme 

The coupling time of the above bounding chain can be reduced through the Dyer-Greenhill scheme. 
The main idea is to enable two elements in the independent set to swap positions. Given p s € [0,1], 
if, in the Gibbs sampler, an attempt to add v to the independent set I fails due to the presence of a 
unique neighbour u already in /, then with probability p s , the independent set becomes I + v — u. 
A bounding chain can easily be defined for this new scheme. 
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4.1.3 Oracle skipping scheme 


Now consider oracle skipping for the bounding chain of the Gibbs sampler. For each vertex v, we 
have two events: adding v to /, denoted a v , and removing v from /, denoted r v . The active events 
are: 


• the r v for which v (f C, 

• the a v for which v G C and N (v) Pi B = 0, 


• the a v for which v G D and N ( v ) C C. 


Let V r and V a be the set of vertices for which removal and addition are respectively active 
in (B,D). For the Gibbs sampler, events are drawn according to the conditional distribution by 
picking an event uniformly at random in V z , where z = a with probability A \ V a \ / (A \ V a \ + |Vj.|), 
and z = r otherwise. 

For a vertex v G V, the fact that a v and r v are active is only modified when v, or one of 
its neighbours, is modified. It is therefore possible to locally update the conditional distribution 
at each iteration by simply updating the “activeness” of events for the modified vertex and its 
neighbours. This justifies using oracle skipping rather than incremental skipping in this context. 

Note that those three samplers can be adapted to the case of weighted vertices and product- 
form stationary processes of the form 


Pa(I) 


1 

Za 


nw, 

v£l 


where A = (X v ) v ^v is a weight-vector of the vertices. For the Gibbs sampler, A is replaced by the 
A(u) of the selected vertex. The other samplers are modified accordingly. 


4.2 Star graph 

In this paragraph, we study the graph 

G n = ([0, n], {(0, i), i G [l,n]}), 

called star graph. We focus mainly on the performance of the oracle skipping scheme for large 
values of A, such as when A n. The independents of this graph are 

Z = {{0}}UP([l,n]). 

First, we consider the coupling time r of the Glauber dynamic of the bounding chains without 
skipping, both in the case of the Gibbs sampler and of the Dyer-Greenhill sampler. 

Since at most one vertex is removed from D at each iteration, and the algorithm finishes when 
D = 0, this coupling time is lower bounded by the hitting time of 

(B,{0})U{B,D), 0 iD. 

Furthermore, since no vertex can be added to B so long as D contains both 0 and an element in 
[l,n], B = 0 until one of those states is reached. 

In the case of the Gibbs sampler, if A > 1, the expected hitting time of (0, {0}) is 0(A"). 
Furthermore, before reaching this state, the probability of removing 0 from D is exactly ( - \ +1 j L ( n+1 ) 
at each time step. For n large enough, this gives 

E[r] > (n + 1)(A + 1). 

For the Dyer-Greenhill sampler, the coupling time t dg is greatly reduced: the expected hitting 
time of (B,D), 0 ^ D is constant: 

„ A+ln+11 
A n p s 
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This is due to the fact that the first attempt to swap a vertex other than 0 will immediately 
remove 0 from D, since it is the only neighbour of the selected vertex. 

On the other hand, for the bounding chain to couple, every vertex must be selected at least once 
for addition or removal. As at each step, the modified vertex is chosen uniformly and independently 
at random, this gives that 

E [t dg ] > nlnn + 0(1). 

Now, let us consider the (forward) coupling time t° of the coupling chain with oracle skipping. 
The coupling time is at most the hitting time of (H,0). We have two main steps to consider: 

1. The hitting time of a state (B, D) where 0 ^ D\ 

2. The hitting time of (B, 0). 

Let us first focus on the hitting time of ( B , D) where 0 £ D. Consider the following birth- 
and-death process on [0,n], where state i represents the cardinal of C, until 0 is added to C. In 
state i, the active events are the addition of vertices in C and the removal of vertices in T. As a 
consequence, the probabilities Pi t i +1 and Pi+ip to go respectively from state i to state i + 1 and 
from i + 1 to i are 




n — i 
n — i +i\ 


and Pi+i,i 


(i + 1)A 

n — i — l + (i + 1)A 


( 8 ) 


As we assumed A > n, computations show that the stationary distribution 7r of this birth-and- 
death process satisfies, for all i £ [0,n], 


n(i) 



A-b-D 



7r(0). 


so tt(0 ) > \ (1 + A) 

The bounding chain can be bounded by the following process: when in state 0 only, vertex 0 
can be removed with probability l/(n + 1) (all events are active for removal, none for addition). 
Then the expected time T\ for reaching a state ( B 1 D) where 0 ^ D is (when A > n) 


E[n] 


(n + 1) 
tt(0) 


< 2e(n + 1). 


For the second step, consider the birth-and-death process on [0, n] where state i represents the 
(B, D) such that \B\ = n — i and 0 ^ D. For i > 0, * vertices are active for addition and at least 
n — i are active for removal. The transitions probabilities are exactly the probabilities p l . j defined 
in Eq. ([s]). 

Simple computations show that the hitting time of state 0 from state n satisfies 


E[t 2 ] 



+ n < n + e n/x = 


n + 0(1). 


Finally, note that in state n, vertex 0 is active for addition, and in case this event is generated 
(which happens with probability ^L_), we have to take into account the return time from the first 
step (0 G D) to the second step (0^0). By the Markov inequality, the probability that state n 
is visited again before state 0 is at most = \~ n . 

As a consequence, the expected coupling time satisfies 


E 



< E [n] + E [t 2 ] + 0(1) < (2e + 1 )n + 0(1). 


Notice that the coupling time does not depend on A and is linear in n. It therefore does better 
then the other samplers presented above. 
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4.3 Numerical experiments 

We now do an experimental comparison of the three samplers described in Section |4.1| for two 
models: the star graph, that has been precisely analysed in Paragraph |4.2[ and the Barabasi- 
Albert model 1J. 

Star graph We performed experiments for a star graph with 100 vertices and for different values 
of A. For each value of A and each sampler, 1000 experiments have been performed, and the average 
number of transitions computed is depicted in Figures [3j 
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Figure 3: Number of events generated by CFTP algortihms for the star graph with 100 vertices 
for different values of A. 


The first remark is that both Dyer-Greenhill and oracle skipping samplers outperform the 
Gibbs sampler. Second, the Dyer-Greenhill sampler seams insensitive to the value of A, which 
conforms to the bound nln?r + 0(1) given in Section 4.2 Finally, the oracle skipping scheme is 
always the most efficient algorithm. It is noticeable that the number of event generated decreases 
with A. This can be explained the following way: large independent sets are favored when A grows. 
Then, after reaching the independent set [l,n], whose probability grows with A, the probability 
that the next event is active is less than 1/(1 + A). As a consequence, many events are skipped. 

The difference in behavior between the Dyer-Greenhill and oracle skipping samplers is more 
obvious with the star graph with 1000 vertices, as depicted in Figure [2] (100 experiments are run 
for each value of A). 
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Figure 4: Number of events generated by CFTP algortihms for the star graph with 1000 vertices 
for different values of A. 
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Barabasi-Albert model We now generate a random graph with preferential attachment. Start 
from a clique with 5 vertices and at each step add one new vertex v and two edges (v,W\) and 
(■ v,W 2 ), where w\ and W 2 are chosen at random with probability proportional to their degree. 
Figure [5] compares the average number of events generated for 100 experiments with the three 
samplers, for graphs with 100 vertices. 
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Figure 5: Number of events generated by CFTP algortihms for the Barabasi-Albert model with 
100 vertices for different values of A. 

Similarly to the star graph, Dyer-Greenhill and oracle skipping samplers outperform the Gibbs 
sampler, and the oracle skipping sample is sensitively better than the Dyer-Greenhill one. For 
large values, those two samplers are not sensitive to A (or slightly improve when A grows). 
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5 Conclusions 

The main contribution of the paper is Algorithm 3, that speeds up the Markovian dynamics in 
the CFTP scheme for exact sampling from the stationary distribution of a Markov chain. We 
illustrated it here on the problem of randomly generating independent sets, but its applicability 
is much broader, within the context of the random generation of combinatorial objects using 
Glauber dynamics, or elsewhere. The application to the simulation of queueing networks is an 
ongoing research. 
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